So much for that.
And let's do a quick recap again.
We introduced support vector machines.
The main idea is that we're trying to find the line that maximally separates two classes of
points, more precisely by maximizing this margin here between the two classes.
The nice thing about that is that we only need to consider the points that are actually
on those dotted lines, i.e. those that are closest to the opposing class.
The way that works is that we take this magic equation here, plug in our data,
and then solve for these alphas by handing the problem to some optimization library or other.
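For reference, the equation in question is presumably the standard dual formulation of the margin-maximization problem, which in the usual notation, with labels y_i in {-1, +1}, looks roughly like this (notation assumed, not taken from the slides):

```latex
\max_{\alpha}\;\; \sum_{i} \alpha_i \;-\; \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (\mathbf{x}_i \cdot \mathbf{x}_j)
\qquad \text{subject to}\quad \alpha_i \ge 0 \quad\text{and}\quad \sum_i \alpha_i y_i = 0 .
```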
And the nice thing is that if we do that, then it turns out that even though we get one
parameter per data point, most of those parameters will turn out to be zero anyway,
namely all of those that are not associated with support vectors.
The entire optimization problem is convex, so we are guaranteed to find the global optimum.
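To illustrate that sparsity, here is a minimal sketch, not the lecture's own code; it assumes scikit-learn and a made-up toy data set, and uses a large C to approximate the hard-margin case:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: two linearly separable blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A large C approximates the hard-margin SVM discussed in the lecture.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only the support vectors get a nonzero alpha; every other point contributes nothing.
print("data points:    ", len(X))
print("support vectors:", len(clf.support_vectors_))
print("nonzero alphas: ", clf.dual_coef_.shape[1])  # one per support vector
```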
And the really nice thing is that the data points that I have only occur in this equation
in the form of this particular scalar product, which is nice because we can exploit that
to do a neat little trick.
And that trick consists of finding a feature space, i.e. some other vector space that I
can project my data into such that it becomes linearly separable if it is not already in
the original space that I'm looking at.
For example, if I have a bunch of dots where the actual separator that I'm trying to find
is some kind of circle thingy, then what I can do is project my data into a three-dimensional
space that looks like this.
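As a minimal sketch of that idea, with a made-up projection that may differ from the one on the slide: lifting each point by its squared radius turns the circle-shaped boundary into a flat one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two classes that are not linearly separable in 2D:
# one cluster around the origin, one ring of radius about 2 around it.
inner = rng.normal(0.0, 0.3, size=(50, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=50)
outer = np.column_stack([2.0 * np.cos(angles), 2.0 * np.sin(angles)])
outer += rng.normal(0.0, 0.1, size=outer.shape)

def lift(points):
    """Map (x1, x2) to (x1, x2, x1^2 + x2^2): the squared radius becomes the third axis."""
    return np.column_stack([points, (points ** 2).sum(axis=1)])

# In the lifted 3D space a horizontal plane (constant third coordinate)
# separates the two classes, even though no line does so in 2D.
print("inner class, largest third coordinate:", lift(inner)[:, 2].max())
print("outer class, smallest third coordinate:", lift(outer)[:, 2].min())
```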
And then it turns out that I can replace the scalar product in the SVM equation by a kernel
function, where a kernel function is just some function that corresponds to an inner product
on the feature space, i.e. on the image of some mapping into a different, possibly much
higher-dimensional vector space.
In that space I take some kind of inner product, usually again just the scalar product,
apply that, and then put that into the SVM equation instead of the normal scalar product.
And the nice thing about that is that if I choose this kernel function well, then it is
relatively efficient to compute, so I don't necessarily make the problem significantly
more complicated by picking a smart kernel function, but conceptually I get
a much higher-dimensional space where a linear separator can actually be found.
So what property does the kernel function need to have?
Well, it needs to be a symmetric function that is positive definite on my
data set.
Any function with those properties is a valid kernel, and I can just plug it into the SVM equation.
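A quick way to sanity-check that property on a concrete data set, as a sketch rather than a proof of the general condition, is to build the Gram matrix and look at its symmetry and eigenvalues (the kernel and data below are illustrative choices):

```python
import numpy as np

def polynomial_kernel(x, y, d=2):
    """Candidate kernel: (1 + x . y)^d."""
    return (1.0 + x @ y) ** d

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))  # 30 points in 5 dimensions

# Gram matrix K[i, j] = k(x_i, x_j) on this data set.
K = np.array([[polynomial_kernel(xi, xj) for xj in X] for xi in X])

print("symmetric:", np.allclose(K, K.T))
# Positive (semi-)definite on the data set: no significantly negative eigenvalues.
print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())
```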
The popular ones, just as one example, are the polynomial kernels, where I take one
plus the scalar product, raised to the power d for some value of d. That corresponds to
a feature space whose dimension is exponential in d, while the computation itself is just
the scalar product plus one, raised to the power d. So it is very efficient to compute, but
conceptually you get many more dimensions in which it is much easier to find a linear separator.
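To make that concrete, here is a small sketch for d = 2 and two input dimensions; the explicit feature map is written out only to show that both routes give the same number, while the kernel shortcut never builds the larger vectors:

```python
import numpy as np

def poly_kernel(x, y, d=2):
    """Kernel shortcut: one scalar product, one addition, one power."""
    return (1.0 + x @ y) ** d

def phi(x):
    """Explicit degree-2 feature map for 2D input (6 dimensions instead of 2)."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1 ** 2, x2 ** 2, s * x1, s * x2, s * x1 * x2])

x = np.array([0.5, -1.0])
y = np.array([2.0, 0.25])

# Both routes compute the same inner product.
print(poly_kernel(x, y))  # (1 + x . y)^2, computed in the original 2D space
print(phi(x) @ phi(y))    # same value via the explicit 6D feature space
```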
Apart from that, we talked about neural networks.
The most prominent, or at least the easiest to understand, model of a neuron
is the McCulloch-Pitts unit.
A McCulloch-Pitts unit wraps a scalar product between a weight vector and an activation
vector in some activation function, usually something like a threshold function or a logistic
function.
Those tend to be rather popular, as does the ReLU, which simply clips negative inputs to zero and is linear above that.
Connecting nodes like these together then gives us a McCulloch-Pitts network.
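As a rough sketch of such a unit and of wiring a few of them together (the weights, thresholds, and the XOR example below are illustrative choices, not taken from the lecture):

```python
import numpy as np

def threshold(z, t=0.0):
    """Step activation: fire (1) if the weighted input reaches the threshold t."""
    return 1.0 if z >= t else 0.0

def logistic(z):
    """Smooth alternative to the hard threshold."""
    return 1.0 / (1.0 + np.exp(-z))

def unit(weights, activations, activation_fn=threshold):
    """A McCulloch-Pitts-style unit: an activation function wrapped around w . a."""
    return activation_fn(np.dot(weights, activations))

# Wiring a few units together: two hidden units feeding one output unit computes XOR,
# which a single unit cannot represent.
def xor_network(x1, x2):
    inputs = np.array([x1, x2])
    h_or = unit(np.array([1.0, 1.0]), inputs, lambda z: threshold(z, t=0.5))
    h_and = unit(np.array([1.0, 1.0]), inputs, lambda z: threshold(z, t=1.5))
    return unit(np.array([1.0, -1.0]), np.array([h_or, h_and]),
                lambda z: threshold(z, t=0.5))

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_network(a, b))
```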